Abstract: The field of compressed sensing has shown that a sparse but otherwise arbitrary vector can be recovered exactly from a small number of randomly constructed linear projections (or samples). The question addressed in this paper is whether an even smaller number of samples is sufficient when there exists prior knowledge about the distribution of the unknown vector, or when only partial recovery is needed. An information-theoretic lower bound, with connections to free probability theory, and an upper bound corresponding to a computationally simple thresholding estimator are derived. It is shown that in certain cases (e.g., discrete-valued vectors or large distortions) the number of samples can be decreased. Interestingly, though, it is also shown that in many cases no reduction is possible.

I. Introduction 

Suppose that an unknown vector $x$ of length $n$ is observed using a set of linear projections $y = Ax$, where $A$ is a known $m \times n$ sampling matrix. The field of compressed sensing (see references in [1]) has shown that if $x$ is sparse (i.e., has a relatively small number of nonzero elements), then exact recovery is possible even if the number of samples $m$ is much less than the vector length $n$. A great deal of work has considered necessary and sufficient conditions on the sampling matrix $A$ with respect to various recovery goals. In particular, much of this work has focused on sufficient conditions for computationally efficient recovery algorithms.

Typically, the conditions on the sampling matrix are remark- 
ably general with respect to the unknown vector x in the sense 
that they require no assumptions about the values or locations 
of the nonzero elements. Moreover, many of the results still 
apply even if x is not actually sparse, but instead has a sparse 
representation with respect to a known basis. 

In many practical situations, however, there exists prior knowledge about the values of the nonzero elements. In this paper, we address the extent to which this additional information allows for recovery using an even smaller number of samples than are needed in the general "compressed sensing" setting. We focus exclusively on recovery of the support set (i.e., the locations of the nonzero elements) in the high-dimensional setting and ask the following two questions:
• What if we consider approximate support recovery? In Section III we show that if the sampling matrix A is designed with knowledge of the basis in which x is sparse, then there exists a natural tradeoff between accuracy and the number of samples. Conversely, if the sampling matrix is designed independently of the sparse basis, then no such tradeoff is possible.
• What if x is a random vector with a known distribution? If the distribution is discrete, then it is straightforward to see that only one sample is needed. In Section IV we consider general distributions, and our main results (Theorems 1 and 2) show that knowledge of the distribution may or may not decrease the number of samples that are needed, depending on the desired distortion and on properties of the distribution such as its differential entropy.

An additional contribution of this paper is the proof of our main lower bound (Section V), which uses results from free probability theory to characterize the limiting distributions of certain random matrices that occur frequently in compressed sensing.

A number of related works have addressed various bounds on the asymptotic sampling rate needed in the noisy setting [2]-[12]. In these cases, properties such as the size of the smallest nonzero values dramatically affect the number of samples that are needed. The noiseless setting addressed in this paper, however, gives insight into fundamental limitations of the sampling process that cannot be overcome simply by increasing the signal-to-noise ratio.

II. Problem Setup 

We consider a generalized sparsity model where an unknown vector $x \in \mathbb{R}^n$ is assumed to have a sparse representation $u \in \mathbb{R}^n$ with respect to a known orthonormal basis $B \in \mathbb{R}^{n \times n}$, given by
$$x = Bu.$$
The support $s \subset \{1, 2, \dots, n\}$ is the set of integers indexing the nonzero elements of $u$,
$$s := \{ i : u_i \neq 0 \},$$
and the sparsity $k = |s|$ is the number of nonzero elements.

The vector of samples $y \in \mathbb{R}^m$ is expressed in terms of a sampling matrix $A \in \mathbb{R}^{m \times n}$:
$$y = Ax.$$

Throughout this paper, we assume that an estimator is given the set $(y, A, B, k)$ and the goal is to recover the support $s$ of the sparse representation $u$. The distortion between a support $s$ and its estimate $\hat{s}$ is measured using the Hamming distance
$$d(s, \hat{s}) := |s \cup \hat{s}| - |s \cap \hat{s}|.$$
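The distortion above is simply the size of the symmetric difference between the true and estimated supports. A minimal sketch of it, together with the error event $d(s, \hat{s}) > \alpha k$ used in the definitions that follow; the function names are hypothetical and supports are represented as Python sets of indices.

```python
def hamming_distortion(s, s_hat):
    """d(s, s_hat) = |s ∪ s_hat| - |s ∩ s_hat|, i.e. the size of the symmetric difference."""
    s, s_hat = set(s), set(s_hat)
    return len(s | s_hat) - len(s & s_hat)


def exceeds_distortion(s, s_hat, alpha):
    """Error event d(s, s_hat) > alpha * k, where k = |s| is the true sparsity."""
    return hamming_distortion(s, s_hat) > alpha * len(s)


# One missed index (5) and one false alarm (7) give distortion 2.
print(hamming_distortion({0, 3, 5}, {0, 3, 7}))
```

Each missed index and each false alarm contributes one unit, so the normalized distortion $\alpha$ bounds the number of such mistakes relative to $k$.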



This paper focuses on whether or not a given recovery task is possible using an $m \times n$ sampling matrix $A$. One possible requirement is that $A$ be good uniformly for all possible $k$-sparse vectors. This paper, however, considers a less stringent requirement and instead asks whether there exists a distribution $p_A$ such that recovery is possible, with high probability, for any $k$-sparse vector $u$ when $A \sim p_A$ is a random matrix drawn independently of $u$ (and possibly also of $B$).

To highlight the difference between the above requirements, it is useful to consider the task of exact recovery. It can be shown that there exists a sampling matrix $A$ satisfying the first requirement if and only if $m \geq \min(2k, n)$, whereas there exists a distribution $p_A$ satisfying the second requirement if and only if $m \geq \min(k + 1, n)$.
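To make the second requirement concrete, here is a small sketch (not from the paper) of exact support recovery from $m = k + 1$ Gaussian samples by exhaustive search over all size-$k$ supports. It assumes $B = I$, noiseless samples, and a tiny $n$; the search is exponential in $n$ and only illustrates that a random $A$ drawn independently of $u$ identifies the support with probability one. The function name `exhaustive_support_search` is hypothetical.

```python
import itertools

import numpy as np


def exhaustive_support_search(y, A, k):
    """Return the size-k column subset of A with the smallest least-squares residual for y."""
    best_support, best_residual = None, np.inf
    for cand in itertools.combinations(range(A.shape[1]), k):
        cols = A[:, list(cand)]
        coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
        residual = np.linalg.norm(y - cols @ coef)
        if residual < best_residual:
            best_support, best_residual = set(cand), residual
    return best_support


rng = np.random.default_rng(0)
n, k = 10, 3
m = k + 1                                   # one more sample than the sparsity
true_support = {1, 4, 8}
u = np.zeros(n)
u[sorted(true_support)] = rng.standard_normal(k)
A = rng.standard_normal((m, n))             # drawn independently of u (B = I here)
y = A @ u
print(exhaustive_support_search(y, A, k) == true_support)
```

With $m = k + 1$, the sample $y$ lies in the $k$-dimensional span of the true columns, and for a generic random $A$ no other set of $k$ columns spans a subspace containing it; a fixed $A$ with $m < 2k$, by contrast, always admits some pair of $k$-sparse vectors it cannot distinguish.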

To characterize the number of samples that are needed, we focus on the high-dimensional setting where the vector length $n$ becomes large. We assume that for each $n$, the sparsity is given by $k_n = \lfloor \Omega n \rfloor$ for some known sparsity rate $\Omega \in (0, 1)$. The following definitions are used to characterize the asymptotic sampling rate given by $\rho = m_n / n$.

Definition 1. The general source $\mathcal{X}_n(\Omega)$ outputs an arbitrary (non-random) vector $x \in \mathbb{R}^n$ and basis $B \in \mathbb{R}^{n \times n}$, where $x = Bu$ for some vector $u \in \mathbb{R}^n$ whose support $s$ has size $k = \lfloor \Omega n \rfloor$.

Given any support estimator $\hat{s}(y, A, B, k)$ and any distribution $p_A$, the probability that the fraction of errors exceeds the normalized distortion $\alpha \in [0, 1]$ for the general source $\mathcal{X}_n(\Omega)$ is given by
$$P_e^{(n)} = \sup_{(x, B) \in \mathcal{X}_n(\Omega)} \Pr\left( d\big(s, \hat{s}(y, A, B, k)\big) > \alpha k \right). \quad (1)$$

Definition 2. A sampling rate distortion pair $(\rho, \alpha)$ is said to be achievable for a source $\mathcal{X}$ if for each integer $n$ there exists an estimator $\hat{s}(y, A, B, k)$ and a distribution $p_A$ on a $\lfloor \rho n \rfloor \times n$ sampling matrix such that
$$P_e^{(n)} \to 0 \quad \text{as } n \to \infty.$$
The sampling rate distortion function $\rho(\alpha)$ is the infimum of rates $\rho \geq 0$ such that the pair $(\rho, \alpha)$ is achievable.

III. Arbitrary Signals 

This section considers the sampling rate distortion function $\rho(\alpha)$ of the general source $\mathcal{X}(\Omega)$ for two different restrictions on the sampling matrix.

Definition 3. A random sampling matrix A is said to be 
universal if it is drawn independently of the basis B, and 
basis-specific otherwise. 

One useful property of a universal sampling matrix is that it can be constructed without knowledge of the sparse basis. Recovery with respect to a basis-specific matrix, however, is equivalent to assuming that the basis is the identity matrix (i.e., $B = I$), since any target matrix $A_0$ designed for this setting can be applied to a general basis $B$ by using the sampling matrix $A = A_0 B^{-1}$ (which equals $A_0 B^T$ for an orthonormal $B$). The following result shows that the universal and basis-specific settings are the same when exact recovery is required but significantly different when a nonzero distortion is allowed.
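The reduction from a general basis to $B = I$ can be checked in a few lines. A minimal sketch, assuming an arbitrary orthonormal $B$ (here from a QR factorization) and an arbitrary target matrix $A_0$; all sizes and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 4
B, _ = np.linalg.qr(rng.standard_normal((n, n)))   # an orthonormal basis (B^T B = I)
A0 = rng.standard_normal((m, n))                   # sampling matrix designed for B = I
u = np.zeros(n)
u[[1, 4]] = [2.0, -1.5]                            # sparse representation of x
x = B @ u

A = A0 @ B.T                                       # A = A0 B^{-1}, since B^{-1} = B^T here
y = A @ x
print(np.allclose(y, A0 @ u))                      # samples equal those A0 takes of u directly
```

Because $AB = A_0 B^T B = A_0$, the samples of $x$ are exactly the samples $A_0$ would take of $u$; this is only possible when $A$ is allowed to depend on $B$.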

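Proposition 1. The sampling rate distortion function $\rho(\alpha)$ of the general source $\mathcal{X}(\Omega)$ is given by
$$\rho(\alpha) = \begin{cases} \Omega, & \text{if } A \sim p_A \text{ is universal}, \\ (1 - \alpha)\,\Omega, & \text{if } A \sim p_{A|B} \text{ is basis-specific}, \end{cases}$$
for $\alpha < 1$, and is equal to zero otherwise.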

Proof Sketch: If the basis is known, then a "rate sharing" strategy may be employed to convexify the achievable rate distortion region. Roughly speaking, this corresponds to ignoring some randomly chosen subset of the elements of $u$ by placing zeros in the corresponding columns of the matrix $AB$. In the universal setting, however, this strategy is not possible. A full proof is given in [11]. ■
Fig. 1. Comparison of the normalized sampling rate distortion function $\rho(\alpha)/\Omega$ of the general source $\mathcal{X}(\Omega)$ as a function of the distortion $\alpha$ for the universal and basis-specific settings.
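The two curves compared in Fig. 1 follow directly from Proposition 1. A minimal sketch evaluating them on a grid of distortions; the function name `rho_general` is hypothetical, and $\Omega = 0.35$ is an arbitrary choice since the normalized curves do not depend on $\Omega$.

```python
import numpy as np


def rho_general(alpha, omega, universal=True):
    """Sampling rate distortion function of the general source X(Omega) (Proposition 1)."""
    alpha = np.asarray(alpha, dtype=float)
    rate = np.full_like(alpha, omega) if universal else (1.0 - alpha) * omega
    return np.where(alpha < 1.0, rate, 0.0)          # equal to zero for alpha >= 1


omega = 0.35
alphas = np.linspace(0.0, 1.0, 6)
universal_curve = rho_general(alphas, omega, universal=True) / omega   # normalized as in Fig. 1
specific_curve = rho_general(alphas, omega, universal=False) / omega
print(universal_curve)
print(specific_curve)
```

The universal curve stays flat at one until $\alpha = 1$, while the basis-specific curve decreases linearly, which is the tradeoff obtained by rate sharing.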

IV. Random Signals 

So far, we have considered the recovery of arbitrary vectors 
and the results have been mostly algebraic. In this section, we 
consider recovery of random vectors. We focus exclusively on 
the universal setting where the sampling matrix A must be 
designed independently of the sparse basis B. 

Definition 4. The random source $\mathcal{X}_n(\Omega, F)$ outputs a random vector $X \in \mathbb{R}^n$ and basis $B \in \mathbb{R}^{n \times n}$, where $X = BU$ for a random vector $U \in \mathbb{R}^n$ whose support $S$ is distributed uniformly over all possibilities of size $k = \lfloor \Omega n \rfloor$ and whose nonzero elements $\{U_i : i \in S\}$ are i.i.d. $\sim F$. The basis $B$ is distributed uniformly over the set of all orthonormal matrices and is independent of $U$.
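Drawing one realization of the random source makes the three independent ingredients explicit. A minimal sketch, assuming the standard construction of a uniformly (Haar) distributed orthogonal matrix via the QR factorization of a Gaussian matrix with a sign correction; `sample_random_source`, `haar_orthogonal`, and the example distribution `gaussian_F` are hypothetical names.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix from the Haar (uniform) distribution.

    QR of a Gaussian matrix, with column signs fixed by the diagonal of R so the law is exactly Haar.
    """
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))


def sample_random_source(n, omega, sample_F, rng):
    """One draw (X, B, U, S) from X_n(Omega, F); sample_F(size, rng) draws i.i.d. values from F."""
    k = int(np.floor(omega * n))
    S = np.sort(rng.choice(n, size=k, replace=False))   # support uniform over all size-k subsets
    U = np.zeros(n)
    U[S] = sample_F(k, rng)                             # nonzero values i.i.d. ~ F
    B = haar_orthogonal(n, rng)                         # basis drawn independently of U
    return B @ U, B, U, set(S.tolist())


def gaussian_F(size, rng, mu=0.5):
    """Example F = N(mu, 1 - mu^2), the family used in Section IV-C."""
    return rng.normal(mu, np.sqrt(1.0 - mu**2), size)


rng = np.random.default_rng(2)
X, B, U, S = sample_random_source(200, 0.35, gaussian_F, rng)
```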

We assume throughout that $F$ denotes the distribution of a real-valued random variable with finite power and zero probability mass at zero. Also, the definitions of achievability are the same as for the general source, except that the probability of error is taken with respect to the random vector $X$ and the random basis $B$:
$$P_e^{(n)} = \Pr\left\{ d\big(S, \hat{s}(Y, A, B, k)\big) > \alpha k \right\}. \quad (2)$$



A. Lower Bounds 

This section gives an information-theoretic lower bound on the sampling rate distortion function $\rho(\alpha)$ of a random source $\mathcal{X}(\Omega, F)$. To begin, we note that in some cases the constraints imposed by the distribution $F$ significantly alter the nature of the estimation problem.

Proposition 2 (Discrete Signals). Suppose that the distribution $F$ is supported on a discrete and finite subset of $\mathbb{R} \setminus \{0\}$. Then only $m = 1$ sample is sufficient for exact recovery, and the sampling rate distortion function $\rho(\alpha)$ of the random source $\mathcal{X}(\Omega, F)$ is $\rho(\alpha) = 0$ for all $\alpha$.

Proof: Suppose that $A$ is a $1 \times n$ "matrix" whose elements are drawn i.i.d. from a continuous distribution with finite power. Then, with probability one, the projection $u \mapsto ABu$ maps each of the possible realizations of $u$ to a unique real number. ■
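The proof suggests a decoder: tabulate the single sample produced by every possible realization of $u$ and invert the table. A minimal sketch, assuming exact arithmetic is well approximated by floating point for a tiny problem ($n = 6$, $k = 2$, nonzero values in the hypothetical alphabet $\{-1, 2\}$) and decoding by nearest codeword; `realizations` and `codebook` are hypothetical names.

```python
import itertools

import numpy as np


def realizations(n, k, alphabet):
    """Yield every k-sparse u in R^n whose nonzero entries take values in the finite alphabet."""
    for support in itertools.combinations(range(n), k):
        for values in itertools.product(alphabet, repeat=k):
            u = np.zeros(n)
            u[list(support)] = values
            yield support, u


rng = np.random.default_rng(3)
n, k, alphabet = 6, 2, (-1.0, 2.0)
B = np.linalg.qr(rng.standard_normal((n, n)))[0]   # any orthonormal basis
a = rng.standard_normal(n)                          # the 1 x n sampling "matrix"

# Codebook: the single sample a^T B u for every possible realization of u.
codebook = {float(a @ B @ u): support for support, u in realizations(n, k, alphabet)}

# Decode a new realization from its single sample by nearest codeword.
true_support = (1, 4)
u_true = np.array([0.0, -1.0, 0.0, 0.0, 2.0, 0.0])
y = a @ B @ u_true
keys = np.array(list(codebook))
s_hat = codebook[float(keys[np.argmin(np.abs(keys - y))])]
print(s_hat == true_support, len(codebook))
```

All 60 realizations map to distinct reals, so one sample identifies the support. The gaps between codewords shrink rapidly as $n$ grows, which is why this result relies on the noiseless model.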
The fact that only one sample is needed for discrete distributions is not due to the sparsity in the problem (after all, the result does not depend on the sparsity rate $\Omega$), and Proposition 2 provides little insight into cases where the unknown signal may have a density. To address these cases, we introduce the following property of a random signal source.

Definition 5. Given any distribution $F$ with a density and any sparsity rate $\Omega$, the function $\theta(\Omega, F) \in [0, 1]$ is given by
$$\theta(\Omega, F) = \frac{(2\pi e)^{-1} \exp\big(2 h(F)\big)}{\sigma_F^2 + (1 - \Omega)\,\mu_F^2}, \quad (3)$$
where $\mu_F$, $\sigma_F^2$, and $h(F)$ denote the mean, variance, and differential entropy of the distribution $F$. If $F$ does not have a density, then $\theta(\Omega, F) = 0$.
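As a worked instance of (3), consider the Gaussian family $F = \mathcal{N}(\mu, 1 - \mu^2)$ used in Section IV-C. Its entropy power equals its variance $1 - \mu^2$, so (3) reduces to $\theta(\Omega, F) = (1 - \mu^2)/(1 - \Omega\mu^2)$. A minimal sketch, assuming differential entropy in nats; `theta` and `theta_gaussian` are hypothetical names.

```python
import numpy as np


def theta(omega, mean, var, entropy):
    """theta(Omega, F) = N(F) / (sigma_F^2 + (1 - Omega) mu_F^2), with N(F) = exp(2 h(F)) / (2 pi e)."""
    entropy_power = np.exp(2.0 * entropy) / (2.0 * np.pi * np.e)
    return entropy_power / (var + (1.0 - omega) * mean**2)


def theta_gaussian(omega, mu):
    """Gaussian F = N(mu, 1 - mu^2); its differential entropy is 0.5 * log(2 pi e var) nats."""
    var = 1.0 - mu**2
    return theta(omega, mu, var, 0.5 * np.log(2.0 * np.pi * np.e * var))


print(theta_gaussian(0.35, 0.0))                # zero-mean Gaussian: theta = 1
print(theta_gaussian(0.35, 0.5))                # equals 0.75 / (1 - 0.35 * 0.25)
print(theta(0.35, 1.5, 1.0 / 12.0, 0.0))        # Uniform[1, 2]: h(F) = 0 nats
```

Increasing $\mu$ pushes $\theta$ toward zero, matching the interpretation of $\theta$ as a distance from a discrete source: at $\mu = 1$ the distribution degenerates to a point mass.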

The quantity $\theta(\Omega, F)$ is the normalized entropy power of the nonzero elements and is equal to one if and only if $F$ is a zero-mean Gaussian distribution. Roughly speaking, one may interpret $\theta(\Omega, F)$ as the relative "distance" between a random source $\mathcal{X}(\Omega, F)$ and a discrete source. The following result, which is proved in Section V, uses this property to lower bound the sampling rate distortion function.

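Theorem 1 (Lower Bound). A sampling rate distortion pair $(\rho, \alpha)$ is not achievable for the random source $\mathcal{X}(\Omega, F)$ if $\rho < \Omega$ and
$$\frac{\rho}{2} \log\left( \frac{1}{\theta(\Omega, F)} \, \frac{\Lambda(\rho)}{\Lambda(\rho/\Omega)} \right) < H(\Omega) - H(\alpha\Omega), \quad (4)$$
where $\theta(\Omega, F)$ is given by Definition 5, $H(p) = -p \log(p) - (1 - p) \log(1 - p)$ is the binary entropy, and
$$\Lambda(r) = \begin{cases} (1 - r)^{1 - 1/r}, & \text{if } r < 1, \\ 1, & \text{if } r = 1. \end{cases} \quad (5)$$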

One consequence of Theorem 1 is that there is a simple test to see whether or not the sampling rate needed for a random source $\mathcal{X}(\Omega, F)$ is any less than that needed for the general source $\mathcal{X}(\Omega)$.



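Corollary 1 (of Theorem 1). The sampling rate distortion function $\rho(\alpha)$ of the random source $\mathcal{X}(\Omega, F)$ is given by $\rho(\alpha) = \Omega$ for all $\alpha < 1$ such that
$$\theta(\Omega, F) > \Lambda(\Omega) \exp\left( -\frac{2}{\Omega} \big[ H(\Omega) - H(\alpha\Omega) \big] \right). \quad (6)$$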
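Both the test (4) and the threshold (6) are closed-form and easy to evaluate. A minimal sketch, assuming natural logarithms throughout (the inequality (4) is unchanged under a change of base as long as $\log$ and $H$ use the same base); `Lambda`, `H`, `ruled_out`, and `corollary_threshold` are hypothetical names.

```python
import numpy as np


def Lambda(r):
    """Lambda(r) = (1 - r)^(1 - 1/r) for 0 < r < 1, and Lambda(1) = 1, as in (5)."""
    return 1.0 if r >= 1.0 else (1.0 - r) ** (1.0 - 1.0 / r)


def H(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)


def ruled_out(rho, alpha, omega, theta):
    """True if Theorem 1 certifies that (rho, alpha) is NOT achievable (requires 0 < rho < Omega)."""
    if not 0.0 < rho < omega:
        return False
    lhs = 0.5 * rho * np.log(Lambda(rho) / (theta * Lambda(rho / omega)))
    return lhs < H(omega) - H(alpha * omega)


def corollary_threshold(alpha, omega):
    """Right-hand side of (6): if theta exceeds this value, then rho(alpha) = Omega."""
    return Lambda(omega) * np.exp(-(2.0 / omega) * (H(omega) - H(alpha * omega)))


print(corollary_threshold(0.3, 0.35))     # compare with theta(Omega, F) of a candidate F
print(ruled_out(0.34, 0.3, 0.35, 1.0))    # zero-mean Gaussian F, rate just below Omega
```

Note that $\Lambda(r) \to e$ as $r \to 0$ and $\Lambda(r) \to 1$ as $r \to 1$, so the left-hand side of (4) vanishes as $\rho \to 0$ and approaches $(\Omega/2)\log(\Lambda(\Omega)/\theta)$ as $\rho \to \Omega$, which is where (6) comes from.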

B. Upper Bounds 

Theorem 1 shows that in many cases the sampling rate distortion function of a random source is equal to that of the arbitrary source. However, if $\theta(\Omega, F)$ is less than the right-hand side of (6), then the lower bound in Theorem 1 is less than the sparsity rate $\Omega$, and there exists a gap with the upper bound given by the arbitrary setting (Proposition 1). In this section, we investigate improved (i.e., lower) upper bounds for these settings.

One way to upper bound $\rho(\alpha)$ is to directly analyze the estimator that minimizes the error probability $P_e^{(n)}$ given in (2). Although non-asymptotic properties of optimal estimation in the Gaussian setting have been studied (see, for example, [13]), analysis in the asymptotic setting appears to be challenging.

In this paper, we instead derive upper bounds for a com- 
putationally simple, and potentially suboptimal, estimator de- 
scribed below. 

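Definition 6. Suppose that the distribution of a random variable $X$ is given by
$$X = \begin{cases} W, & \text{if } Z = 0, \\ W + \sqrt{\rho}\, U, & \text{if } Z = 1, \end{cases}$$
where $U \sim F$, $W \sim \mathcal{N}(0, \Omega\, \mathbb{E}[U^2])$, and $Z \sim \text{Bernoulli}(\Omega)$ are independent. For any subset $T \subset \mathbb{R}$, let $Z_T(x) = \mathbb{1}(x \in T)$ and define the error probability
$$e(\rho, \Omega, F) = \inf_{T \subset \mathbb{R}} \Pr\{ Z_T(X) \neq Z \}. \quad (7)$$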
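Definition 6 is a scalar detection problem, and the probability inside (7) can be estimated by simulation for any fixed set $T$. A minimal Monte Carlo sketch; since $e(\rho, \Omega, F)$ is an infimum over $T$, each fixed $T$ only yields an estimate of an upper bound on it. The function name, the example $F = \mathcal{N}(0.9, 1 - 0.9^2)$ (so $\mathbb{E}[U^2] = 1$), and the threshold set $T = \{x : x > 0.5\}$ are hypothetical choices.

```python
import numpy as np


def scalar_error_mc(rho, omega, sample_F, second_moment, in_T, n_samples, rng):
    """Monte Carlo estimate of Pr{Z_T(X) != Z} for the scalar model of Definition 6."""
    Z = rng.random(n_samples) < omega                                # Z ~ Bernoulli(Omega)
    W = rng.normal(0.0, np.sqrt(omega * second_moment), n_samples)   # W ~ N(0, Omega E[U^2])
    U = sample_F(n_samples, rng)                                     # U ~ F
    X = W + np.sqrt(rho) * U * Z                                     # W if Z = 0, W + sqrt(rho) U if Z = 1
    return np.mean(in_T(X) != Z)


mu = 0.9                                      # F = N(mu, 1 - mu^2), so E[U^2] = 1


def sample_F(size, rng):
    return rng.normal(mu, np.sqrt(1.0 - mu**2), size)


rng = np.random.default_rng(5)
never = lambda x: np.zeros(x.shape, dtype=bool)       # T = empty set: always decide Z = 0
err_empty = scalar_error_mc(0.9, 0.35, sample_F, 1.0, never, 200_000, rng)
err_half = scalar_error_mc(0.9, 0.35, sample_F, 1.0, lambda x: x > 0.5, 200_000, rng)
print(err_empty, err_half)
```

With $T$ empty the error is just $\Pr\{Z = 1\} = \Omega$; a sensible threshold does better, which is what allows the thresholding estimator below to reach distortions $\alpha$ with $\alpha\Omega$ below $\Omega$.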



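Definition 7. For a random source $\mathcal{X}(\Omega, F)$, the thresholding (TH) estimator $\hat{s}_{TH}(y)$ is given by
$$\hat{s}_{TH}(y) = \{ i : \hat{u}_i \in T^* \}, \quad (8)$$
where $\hat{u} = B^T A^T y \in \mathbb{R}^n$ and $T^* \subset \mathbb{R}$ minimizes the right-hand side of (7) with $\rho = m/n$.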
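The estimator is a matched-filter step followed by an element-wise membership test. A minimal sketch in which the set $T$ is passed in as an indicator function; the extra factor $1/\sqrt{\rho}$ is an assumption on our part that rescales $\hat{u}$ so that, for $A$ with i.i.d. $\mathcal{N}(0, 1/n)$ entries, its entries are approximately distributed like $X$ in Definition 6. The function name and the toy threshold are hypothetical.

```python
import numpy as np


def threshold_support(y, A, B, in_T, rho):
    """Support estimate {i : u_hat_i in T}, with u_hat = B^T A^T y / sqrt(rho).

    ASSUMPTION: the 1/sqrt(rho) factor matches the scale of X in Definition 6 when A has
    i.i.d. N(0, 1/n) entries; drop it if T is defined on the unscaled B^T A^T y.
    """
    u_hat = B.T @ (A.T @ y) / np.sqrt(rho)
    return set(np.flatnonzero(in_T(u_hat)).tolist())


# Toy check with A = B = I (so rho = 1 and u_hat = y) and the hypothetical set T = {x : x > 0.5}.
y = np.array([0.1, 2.0, -0.3, 1.5])
print(threshold_support(y, np.eye(4), np.eye(4), lambda x: x > 0.5, rho=1.0))
```

Each index is decided on its own, so after the two matrix-vector products the test itself is a single linear pass over $\hat{u}$.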

The thresholding estimate corresponds to a separate hypothesis test for each element of $u$, and, once $\hat{u}$ has been formed, its complexity is linear in the vector length $n$.

Proposition 3. Suppose that for each integer $n$, the elements of the sampling matrix $A$ are i.i.d. $\sim \mathcal{N}(0, 1/n)$. Then, for any random source $\mathcal{X}(\Omega, F)$ and sampling rate $\rho$,
$$\frac{1}{n}\, d(\hat{s}_{TH}, S) \to e(\rho, \Omega, F) \quad \text{in probability as } n \to \infty,$$
where $e(\rho, \Omega, F)$ is given by (7).

Proof Sketch: The key step, which is proved in [12], is to show that the empirical distributions of $\{\hat{u}_i : i \in S\}$ and $\{\hat{u}_i : i \notin S\}$ converge to the distribution of the random variable $X$ described in Definition 6 conditioned on the events $Z = 1$ and $Z = 0$, respectively. ■



Combining Propositions 1 and 3 gives the following result, which is complementary to Theorem 1.

Theorem 2 (Upper Bound). A sampling rate distortion pair $(\rho, \alpha)$ is achievable for the random source $\mathcal{X}(\Omega, F)$ if $\rho > \Omega$ or $\alpha\Omega > e(\rho, \Omega, F)$, where $e(\rho, \Omega, F)$ is given by (7).
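For Gaussian $F$ the infimum in (7) can be computed numerically: the optimal $T$ is the region where $\Omega f_1(x) > (1 - \Omega) f_0(x)$, with $f_0, f_1$ the densities of $X$ given $Z = 0$ and $Z = 1$, so $e = \int \min(\Omega f_1, (1 - \Omega) f_0)\,dx$. A minimal sketch, restricted to $F = \mathcal{N}(\mu, 1 - \mu^2)$ from Section IV-C and assuming a truncated grid with the trapezoid rule is accurate enough; it then searches by bisection for the smallest rate certified by Theorem 2. The bisection relies on $e$ being nonincreasing in $\rho$, which holds because the observation at a smaller $\rho$ is a degraded version of the one at a larger $\rho$. Function names are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, var):
    return np.exp(-((x - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def e_gaussian(rho, omega, mu, num=200_001, half_width=12.0):
    """e(rho, Omega, F) from (7) for F = N(mu, 1 - mu^2), using the optimal (MAP) set T."""
    x = np.linspace(-half_width, half_width, num)
    var_w = omega * 1.0                               # Omega E[U^2], with E[U^2] = 1 for this F
    f0 = normal_pdf(x, 0.0, var_w)                    # density of X given Z = 0
    f1 = normal_pdf(x, np.sqrt(rho) * mu, var_w + rho * (1.0 - mu**2))   # given Z = 1
    g = np.minimum(omega * f1, (1.0 - omega) * f0)
    dx = x[1] - x[0]
    return dx * (g.sum() - 0.5 * (g[0] + g[-1]))     # trapezoid rule


def theorem2_rate(alpha, omega, mu, tol=1e-4):
    """Smallest rho (to within tol) with alpha * Omega > e(rho, Omega, F); Omega if none below it."""
    target = alpha * omega
    if e_gaussian(omega, omega, mu) >= target:
        return omega
    if e_gaussian(0.0, omega, mu) < target:
        return 0.0
    lo, hi = 0.0, omega                               # invariant: e(lo) >= target > e(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if e_gaussian(mid, omega, mu) < target:
            hi = mid
        else:
            lo = mid
    return hi


print(theorem2_rate(0.95, 0.35, 0.9) / 0.35)          # normalized upper bound for one (alpha, mu)
```

Sweeping $\mu$ in the last line is one way to trace an upper-bound curve of the kind plotted against $\mu$ in the example that follows.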

C. A Gaussian Example 

This section illustrates the bounds in Theorems 1 and 2 for a random source $\mathcal{X}(\Omega, F)$ where $F$ is a Gaussian distribution with mean $\mu$ and variance $1 - \mu^2$. In Figure 2, bounds on the normalized sampling rate distortion function $\rho(\alpha)/\Omega$ are plotted as a function of the mean $\mu$ for $\alpha = 0.3$. It is shown that if $\mu < \mu^* \approx 0.83$, then the number of samples needed is no different than for the arbitrary source $\mathcal{X}(\Omega)$. However, if $\mu > \mu^*$, there exists a gap between the bounds.

In Figure 3, the same bounds are shown for the relatively large distortion $\alpha = 0.95$. In this case, the upper bound from Theorem 2 is less than the rate needed for the arbitrary source, which verifies that, in some cases, there is a reduction in the number of samples that are needed. We note that the special case $\mu = 1$ corresponds to a discrete distribution, and thus $\rho(\alpha) = 0$ by Proposition 2.

Moderate Distortion ($\alpha = 0.3$)
Fig. 2. Bounds on the normalized sampling rate $\rho/\Omega$ needed to achieve distortion $\alpha = 0.3$ as a function of the distribution mean $\mu$ when the sparsity rate is $\Omega = 0.35$ and the nonzero signal elements are i.i.d. $\mathcal{N}(\mu, 1 - \mu^2)$.



High Distortion ($\alpha = 0.95$)
Fig. 3. Bounds on the normalized sampling rate $\rho/\Omega$ needed to achieve distortion $\alpha = 0.95$ as a function of the distribution mean $\mu$ when the sparsity rate is $\Omega = 0.35$ and the nonzero signal elements are i.i.d. $\mathcal{N}(\mu, 1 - \mu^2)$.



V. Proof of Theorem 1

Throughout this proof we use the notation $A$ interchangeably to denote either a particular $m_n \times n$ matrix $A_n$ or a sequence of such matrices $\{A_n\}$. We begin with the following lemma, which shows that the sampling rate distortion function can be lower bounded by considering an arbitrary sequence $A$.

Lemma 1. Let $A$ denote any sequence of full-rank $\lfloor \rho n \rfloor \times n$ sampling matrices. Then, for any distortion $\alpha$, the sampling rate distortion pair $(\rho, \alpha)$ is not achievable for the random source $\mathcal{X}(\Omega, F)$ if
$$\limsup_{n \to \infty} \frac{1}{n} I(AX; S \mid B) < H(\Omega) - H(\alpha\Omega). \quad (9)$$

Proof Sketch: The lower bound for a given sequence $A$ follows from Fano's inequality (see, e.g., [11]). The fact that the bound for one matrix $A$ applies to any other matrix $A'$ (of equal rank) follows from the fact that there exists an invertible matrix $D$ (based on the singular value decomposition) such that $DAX$ is equal in distribution to $A'X$. ■
Next, we upper bound the left-hand side of (9). Expanding the mutual information for a given problem size $n$ gives
$$I(AX; S \mid B) = h(AX \mid B) - h(AX \mid S, B).$$

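The entropy $h(AX \mid B)$ is upper bounded by the entropy of a Gaussian vector with the same covariance as $AX$, and thus
$$h(AX \mid B) \leq \frac{m}{2} \log\left( 2\pi e\, \sigma_X^2\, |AA^T|^{1/m} \right),$$
where $|\cdot|$ denotes the determinant and $\sigma_X^2 = \Omega\sigma_F^2 + \Omega(1 - \Omega)\mu_F^2$ is the variance of each element of $X$. Furthermore, letting $B_S$ denote the $n \times k$ submatrix of $B$ formed by the columns indexed by $S$, the entropy $h(AX \mid S, B)$ is lower bounded by the entropy power inequality [14] as
$$h(AX \mid S, B) \geq \frac{m}{2} \mathbb{E} \log\left( 2\pi e\, N(F)\, |A B_S B_S^T A^T|^{1/m} \right),$$
where $N(F) = (2\pi e)^{-1} \exp(2 h(F))$ is the entropy power of each nonzero element of $U$. Combining these bounds gives
$$I(AX; S \mid B) \leq \frac{m}{2} \mathbb{E} \log\left( \frac{|AA^T|^{1/m}}{\theta(\Omega, F)\, \left| \frac{1}{\Omega} A B_S B_S^T A^T \right|^{1/m}} \right),$$
where we use the fact that $\theta(\Omega, F) = \Omega N(F) / \sigma_X^2$.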

Without any loss of generality, we may assume that the spectral distribution of $AA^T$ converges to a compactly supported probability measure $\mu$ as $n \to \infty$. Then, $|AA^T|^{1/m} \to G_\mu$ as $n \to \infty$, where
$$G_\mu = \exp\left( \int \log(x)\, d\mu(x) \right).$$



The remaining problem, therefore, is to characterize the spectral distribution of the random matrix $A B_S B_S^T A^T$ as $n$ becomes large. To this end, it is convenient to use results from free probability theory, a theory of non-commutative probability developed by Voiculescu [15]. To begin, observe that the limiting spectral distribution of $A^T A$ has a point mass $\delta_0$ of weight $1 - \rho$ at zero and is given by
$$\bar{\mu} = (1 - \rho)\,\delta_0 + \rho\,\mu.$$



Observe also that the limiting spectral distribution of $B_S B_S^T$ is given by
$$\mu' = (1 - \Omega)\,\delta_0 + \Omega\,\delta_1.$$

The basic idea from free probability is that the sequences $A^T A$ and $B_S B_S^T$ are asymptotically free, and hence the spectral distribution of $B_S^T A^T A B_S$ converges to a probability measure that can be described uniquely in terms of $\bar{\mu}$ and $\mu'$.

To characterize this measure, we use the following definition. The R-transform of a probability measure $\mu$ is given by
$$R_\mu(z) = S_\mu^{-1}(-z) - \frac{1}{z},$$
where $S_\mu^{-1}$ denotes the inverse (with respect to the composition of functions) of the Stieltjes transform,
$$S_\mu(z) = \int \frac{1}{x - z}\, d\mu(x).$$



The following result follows directly from Section 4.4 of Speicher's lecture notes on free probability [16].

Lemma 2. If the limiting spectral distribution of $A^T A$ is equal to $\bar{\mu}$, then the limiting spectral distribution of the random matrix $B_S^T A^T A B_S$ is equal to $\bar{\nu}$ almost surely, where
$$R_{\bar{\nu}}(z) = R_{\bar{\mu}}(\Omega z). \quad (10)$$



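From Lemma 2, and since the nonzero eigenvalues of $B_S^T A^T A B_S$ coincide with those of $A B_S B_S^T A^T$, we conclude that the spectral distribution of $\frac{1}{\Omega} A B_S B_S^T A^T$ converges to a probability measure $\nu$ as $n \to \infty$, where
$$\bar{\nu}(E) = (1 - \rho/\Omega)\,\delta_0(E) + (\rho/\Omega)\,\nu(\Omega^{-1} E)$$
for every Borel set $E \subset \mathbb{R}$.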

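If $\nu$ is compactly supported, then $\left| \frac{1}{\Omega} A B_S B_S^T A^T \right|^{1/m} \to G_\nu$ almost surely as $n \to \infty$. Thus we conclude that
$$\limsup_{n \to \infty} \frac{1}{n} I(AX; S \mid B) \leq \frac{\rho}{2} \log\left( \frac{G_\mu}{\theta(\Omega, F)\, G_\nu} \right) \quad (11)$$
almost surely for any compactly supported probability measures $\mu, \nu$ whose associated measures $\bar{\mu}, \bar{\nu}$ satisfy Equation (10).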

Although the strongest bound corresponds to the minimization over $\mu$, such an optimization appears to be difficult. Instead, we obtain a (potentially suboptimal) bound by setting $\mu$ equal to the Marcenko-Pastur law [17] with parameter $\rho$, i.e.,
$$d\mu(x) = \frac{\sqrt{(x - a)(b - x)}}{2\pi\rho x}\, dx$$
for all $x \in [a, b]$, where $a = (1 - \sqrt{\rho})^2$ and $b = (1 + \sqrt{\rho})^2$. Then, it can be shown that (10) is satisfied when $\nu$ is equal to the Marcenko-Pastur law with parameter $\rho/\Omega$. Integrating with respect to these measures shows that
$$G_\mu = e^{-1}\Lambda(\rho), \qquad G_\nu = e^{-1}\Lambda(\rho/\Omega),$$
which, combined with (11) and Lemma 1, completes the proof.
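The two geometric means can be sanity-checked numerically. A minimal sketch, assuming $A$ has i.i.d. $\mathcal{N}(0, 1/n)$ entries (the Gaussian case discussed in the remark that follows) and $B_S$ has orthonormal columns drawn independently of $A$; the sizes are arbitrary, and finite-$n$ values only approximate the limits $\log G_\mu = -1 + \log\Lambda(\rho)$ and $\log G_\nu = -1 + \log\Lambda(\rho/\Omega)$. The helper `log_Lambda` is a hypothetical name.

```python
import numpy as np


def log_Lambda(r):
    """log Lambda(r) = (1 - 1/r) log(1 - r) for 0 < r < 1, cf. (5)."""
    return (1.0 - 1.0 / r) * np.log1p(-r)


rng = np.random.default_rng(4)
n, m, k = 1000, 500, 700                            # rho = 0.5, Omega = 0.7
rho, omega = m / n, k / n
A = rng.standard_normal((m, n)) / np.sqrt(n)        # i.i.d. N(0, 1/n) entries
B_S = np.linalg.qr(rng.standard_normal((n, k)))[0]  # k orthonormal columns, independent of A

# (1/m) log|A A^T| approximates log G_mu.
_, logdet_mu = np.linalg.slogdet(A @ A.T)
# (1/m) log|(1/Omega) A B_S B_S^T A^T| approximates log G_nu.
AB = A @ B_S
_, logdet_nu = np.linalg.slogdet(AB @ AB.T / omega)

print(logdet_mu / m, -1.0 + log_Lambda(rho))
print(logdet_nu / m, -1.0 + log_Lambda(rho / omega))
```

The second check works because $A B_S$ again has i.i.d. $\mathcal{N}(0, 1/n)$ entries, so $\frac{1}{\Omega} A B_S B_S^T A^T$ is a Wishart-type matrix with aspect ratio $m/k = \rho/\Omega$.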

We remark that convergence of the spectral distribution to the Marcenko-Pastur law corresponds to the setting where the elements of $A$ are i.i.d. zero-mean Gaussian. Interestingly, it is possible to use the rotational invariance of the Gaussian distribution to obtain the same bound given above without appealing to free probability. However, the approach taken above is more general and allows the bound to be calculated for other limiting distributions.

VI. Discussion 

Two insights from the field of compressed sensing are that any sparse vector can be sampled efficiently using linear projections, and that there exist random sampling matrices that are good almost surely for any sparse basis. In this paper, we have investigated what happens if a probability measure is placed on the set of possible vectors and partial recovery is allowed, by bounding the sampling rate distortion function $\rho(\alpha)$. In certain cases, we showed that the number of samples may be decreased. However, we also showed that in many cases no reduction is possible, particularly if one requires universality with respect to the sparse basis.